Statistical machine translation and its challenges
Abstract
Over the last 10 years, the statistical approach has found widespread use not only in speech recognition and syntactic parsing but also in machine translation of both written and spoken language. In many comparative evaluations, the statistical approach was found to be competitive with or superior to existing conventional approaches. Since the first statistical approach was proposed at the end of the 1980s, many attempts have been made to improve the state of the art. Like other natural language processing tasks, machine translation requires four major components: a decision rule, a set of probability models, a training criterion, and an efficient procedure for generating the target sentence. We consider each of these four components in more detail and point out promising research directions.
Similar resources
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...
Europarl: A Parallel Corpus for Statistical Machine Translation
We collected a corpus of parallel text in 11 languages from the proceedings of the European Parliament, which are published on the web. This corpus has found widespread use in the NLP community. Here, we focus on its acquisition and its application as training data for statistical machine translation (SMT). We trained SMT systems for 110 language pairs, which reveal interesting clues into the ...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are at the core of Machine Translation (MT) engines, as these engines are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
Six Challenges for Neural Machine Translation
We explore six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search. We show both deficiencies and improvements over the quality of phrase-based statistical machine translation.
Images as Context in Statistical Machine Translation
This paper reports ongoing experiments towards exploiting the use of images to provide additional context for statistical machine translation (SMT). We investigate whether this contextual information can be helpful in targeting two well-known challenges in machine translation: ambiguity (incorrect translation of words that have multiple senses) and out-of-vocabulary words (words left untranslat...